Symbolic Automata: The Toolkit
Abstract
The symbolic automata toolkit lifts classical automata analysis to work modulo rich alphabet theories. It uses state-of-the-art constraint solvers to make automata analysis both expressive and efficient, even for automata over large finite alphabets. The toolkit supports analysis of finite symbolic automata and transducers over strings, and it also handles transducers with registers. Constraint solving is used when composing and minimizing automata, and a much deeper and more powerful integration is obtained by internalizing automata as theories. The toolkit, freely available from Microsoft Research, has recently been used in the context of web security for analysis of potentially malicious data over Unicode characters.

Introduction. The distinguishing feature of the toolkit is its use of, and operations on, symbolic labels. This is unlike classical automata algorithms, which mostly assume a finite alphabet. Advantages of a symbolic representation are examined in [4], where it is shown that symbolic algorithms consistently outperform classical algorithms (often by orders of magnitude) when alphabets are large. Moreover, symbolic automata can also work with infinite alphabets. Typical alphabet theories are arithmetic (over integers, rationals, bit-vectors), algebraic data types (for tuples, lists, trees, finite enumerations), and arrays. Tuples are used for handling alphabets that are cross-products of multiple sorts.

In the following we describe the core components and functionality of the tool. The main components are Automaton〈T〉, basic automata operations modulo a Boolean algebra T; SFA〈T〉, symbolic finite automata as theories modulo T; and SFT〈T〉, symbolic finite transducers as theories modulo T. We illustrate the tool's API using code samples from the distribution.

Automaton〈T〉. The main building block of the toolkit, which is also defined as a corresponding generic class, is a (symbolic) automaton over T: Automaton〈T〉.
The type T is assumed to be equipped with effective Boolean operations ∧, ∨, ¬, ⊥, is⊥ that satisfy the standard axioms of Boolean algebras, where is⊥(φ) checks whether a term φ is false (thus, to check whether φ is true, check is⊥(¬φ)). The main operations over Automaton〈T〉 are ∩ (intersection), ∪ (union), ∁ (complementation), and A ≡ ∅ (emptiness check). As an example of a simple symbolic operation, consider products: when A and B are of type Automaton〈T〉, then A ∩ B has the transition 〈(p, q), φ∧ψ, (p′, q′)〉 for each pair of transitions 〈p, φ, p′〉 ∈ A and 〈q, ψ, q′〉 ∈ B. Infeasible and unreachable transitions are pruned by using the is⊥ tester. Note that Automaton〈T〉 is itself a Boolean algebra (using the operations ∩, ∪, ∁, ≡ ∅). Consequently, the tool supports building and analyzing nested automata Automaton〈Automaton〈T〉〉.

The tool provides a Boolean algebra solver CharSetSolver that uses specialized BDDs (see [4]) of type CharSet. This solver is used to efficiently analyze .NET regexes with Unicode character encoding. The following code snippet illustrates its use, as well as some other features such as visualization.

    CharSetSolver solver = new CharSetSolver(CharacterEncoding.Unicode); // charset solver
    string a = @"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";   // .NET regex
    string b = @"^\d.*$";                                               // .NET regex
    Automaton<CharSet> A = solver.Convert(a);       // create the equivalent automaton
    Automaton<CharSet> B = solver.Convert(b);
    Automaton<CharSet> C = A.Minus(B, solver);      // construct the difference
    var M = C.Determinize(solver).Minimize(solver); // determinize, then minimize the automaton
    solver.ShowGraph(M, "M.dgml");                  // save and visualize
    string s = solver.GenerateMember(M);            // generate some member, e.g. "[email protected]"

The graph produced by the ShowGraph call on line 8 of the snippet is shown below.

The binary release is available from http://research.microsoft.com/automata.
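The product construction described above, with is⊥-based pruning, can be sketched in a few lines. This is only an illustration under stated assumptions, not the toolkit's implementation: it is written in Python rather than C#, predicates are assumed to be sorted tuples of disjoint inclusive code-point ranges instead of the BDD-based CharSet, and the names `conj`, `neg`, `is_bot`, and `product` are hypothetical and do not correspond to the toolkit's API.

```python
from collections import deque

MAX_CP = 0x10FFFF  # assumed alphabet: Unicode code points 0..MAX_CP

# Assumption: a predicate is a sorted tuple of disjoint inclusive ranges.
BOT = ()                 # the false predicate
TOP = ((0, MAX_CP),)     # the true predicate


def conj(a, b):
    """Conjunction of two predicates: intersection of their range sets."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return tuple(out)


def neg(a):
    """Negation of a predicate: complement relative to 0..MAX_CP."""
    out, nxt = [], 0
    for lo, hi in a:
        if nxt < lo:
            out.append((nxt, lo - 1))
        nxt = hi + 1
    if nxt <= MAX_CP:
        out.append((nxt, MAX_CP))
    return tuple(out)


def is_bot(a):
    """The is-bottom tester: true when no character satisfies the predicate."""
    return len(a) == 0


def product(A, B):
    """Product of two automata given as (initial, finals, delta),
    where delta maps a state to a list of (predicate, target) pairs."""
    init_a, fin_a, delta_a = A
    init_b, fin_b, delta_b = B
    init = (init_a, init_b)
    delta, finals = {}, set()
    seen, work = {init}, deque([init])
    while work:  # explore only pairs reachable from the initial pair
        state = work.popleft()
        p, q = state
        if p in fin_a and q in fin_b:
            finals.add(state)
        for phi, p2 in delta_a.get(p, []):
            for psi, q2 in delta_b.get(q, []):
                guard = conj(phi, psi)
                if is_bot(guard):
                    continue  # prune the infeasible transition
                delta.setdefault(state, []).append((guard, (p2, q2)))
                if (p2, q2) not in seen:
                    seen.add((p2, q2))
                    work.append((p2, q2))
    return init, finals, delta


DIGIT = ((48, 57),)               # '0'..'9'
LETTER = ((65, 90), (97, 122))    # 'A'..'Z', 'a'..'z'

A = (0, {1}, {0: [(DIGIT, 1)], 1: [(DIGIT, 1)]})          # one or more digits
B = (0, {1}, {0: [(LETTER, 1)], 1: [(TOP, 1)]})           # starts with a letter
B2 = (0, {1}, {0: [(neg(LETTER), 1)], 1: [(TOP, 1)]})     # starts with a non-letter

init_ab, finals_ab, delta_ab = product(A, B)      # no feasible transition survives
init_ab2, finals_ab2, delta_ab2 = product(A, B2)  # digit-only strings survive
```

Because only reachable state pairs are explored and every retained guard is satisfiable, the intersection is non-empty exactly when the returned set of final states is non-empty. In this sketch that observation doubles as the emptiness check for A ∩ B.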
Related resources
Lecture Notes in Computer Science 7385
In the last decade, advances in satisfiability-modulo-theories (SMT) solvers have powered a new generation of software tools for verification and testing. These tools transform various program analysis problems into the problem of satisfiability of formulas in propositional or first-order logic, where they are discharged by SMT solvers, such as Z3 from Microsoft Research. This paper briefly sum...
Theoretical Aspects of Symbolic Automata
Symbolic finite automata extend classical automata by allowing infinite alphabets given by Boolean algebras and having transitions labeled by predicates over such algebras. Symbolic automata have been intensively studied recently and they have proven useful in several applications. We study some theoretical aspects of symbolic automata. Especially, we study minterms of symbolic automata, that i...
Forward Bisimulations for Nondeterministic Symbolic Finite Automata
Symbolic automata allow transitions to carry predicates over rich alphabet theories, such as linear arithmetic, and therefore extend classic automata to operate over infinite alphabets, such as the set of rational numbers. Existing automata algorithms rely on the alphabet being finite, and generalizing them to the symbolic setting is not a trivial task. In our earlier work, we proposed new tech...
Tiburon: A Weighted Tree Automata Toolkit
The availability of weighted finite-state string automata toolkits made possible great advances in natural language processing. However, recent advances in syntax-based NLP model design are unsuitable for these toolkits. To combat this problem, we introduce a weighted finite-state tree automata toolkit, which incorporates recent developments in weighted tree automata theory and is useful for na...
Symbolic tree automata
We introduce symbolic tree automata as a generalization of finite tree automata with a parametric alphabet over any given background theory. We show that symbolic tree automata are closed under Boolean operations, and that the operations are effectively uniform in the given alphabet theory. This generalizes the corresponding classical properties known for finite tree automata.
Publication date: 2012